ci(docs): check changed markdown links on pull requests #1139

Open
13ernkastel wants to merge 1 commit into NVIDIA:main from 13ernkastel:codex/issue-552-docs-link-checker

@13ernkastel 13ernkastel commented Mar 31, 2026

Summary

  • add a lightweight pull request workflow that runs the existing docs link checker on changed markdown files
  • improve check-docs.sh output so broken local links include the source line number
  • stabilize the checker's parsing locale so it behaves cleanly across environments
  • add a focused Vitest file that covers broken links and fenced-code exclusions

Why

Issue #552 asks for markdown link checking in CI. The repo already had a useful checker in test/e2e/e2e-cloud-experimental/check-docs.sh, but it only ran in broader E2E contexts and its broken-link output did not point back to the exact markdown line.

This keeps the fix small by reusing the existing checker instead of introducing a second link-checking tool. The pull request workflow runs --local-only on changed markdown files so review-time checks stay fast and avoid flaky network-driven failures.
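
To illustrate what a local-only pass would need to decide, here is a minimal JavaScript sketch that separates local targets from external URLs and in-page anchors. It is not taken from check-docs.sh; the function name `isLocalTarget` and the exact rules are assumptions made for the example.

```javascript
// Hypothetical classifier (not from check-docs.sh): returns true for link
// targets that a local-only run would resolve against the file system.
function isLocalTarget(target) {
  if (target.startsWith("#")) return false; // in-page anchor
  if (target.startsWith("//")) return false; // protocol-relative URL
  // Anything with a URI scheme (http:, https:, mailto:, ...) is external.
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(target)) return false;
  return true;
}
```

Skipping everything with a URI scheme means no network request is ever made, which is what keeps a review-time check fast and deterministic.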

Validation

  • bash -n test/e2e/e2e-cloud-experimental/check-docs.sh
  • ruby -e 'require "yaml"; puts YAML.load_file(".github/workflows/docs-links-pr.yaml")["name"]'
  • npx vitest run test/check-docs-links.test.js
  • bash test/e2e/e2e-cloud-experimental/check-docs.sh --only-links --local-only README.md

Closes #552.

Summary by CodeRabbit

  • New Features

    • Added a "Docs Links PR" workflow to run link checks on changed Markdown in pull requests.
  • Tests

    • Added tests verifying local Markdown link checking, reporting broken links with source context, and ignoring links inside fenced code blocks.
  • Chores

    • Improved link-checking to skip fenced code blocks, include source line numbers in diagnostics, and ensure locale-stable extraction.

Signed-off-by: 13ernkastel [email protected]

Contributor

coderabbitai bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1f2acbee-4837-4587-969c-045fb1f6eb76

📥 Commits

Reviewing files that changed from the base of the PR and between 8d724c5 and c059875.

📒 Files selected for processing (3)
  • .github/workflows/docs-links-pr.yaml
  • test/check-docs-links.test.js
  • test/e2e/e2e-cloud-experimental/check-docs.sh
✅ Files skipped from review due to trivial changes (1)
  • .github/workflows/docs-links-pr.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/check-docs-links.test.js
  • test/e2e/e2e-cloud-experimental/check-docs.sh

📝 Walkthrough

Adds a PR-triggered GitHub Actions workflow that runs a local markdown link checker on changed .md files; the checker now emits source line numbers and skips both backtick and tilde fenced code blocks; new Vitest tests validate detection and exclusions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GitHub Actions Workflow: /.github/workflows/docs-links-pr.yaml | New workflow triggered on PRs to main for .md changes; computes changed markdown files (filters out deleted and ignored paths), exports has_files and the file list, and conditionally runs the link-check script with --only-links --local-only. |
| Link checking script: test/e2e/e2e-cloud-experimental/check-docs.sh | Set LC_ALL=C for deterministic text processing; replace naive extractor with a stateful fence-aware parser supporting variable-length backtick and tilde fences; extractor now emits line_no<TAB>target; check_local_ref and run_links_check updated to accept and report source line numbers. |
| Tests: test/check-docs-links.test.js | New Vitest suite that runs check-docs.sh --only-links --local-only against temp Markdown fixtures; asserts broken local links are reported with source file and line number and verifies links inside varied fenced-code scenarios (backtick, tilde, malformed closers) are ignored. |

Sequence Diagram(s)

sequenceDiagram
    participant PR as Pull Request
    participant GHA as GitHub Actions (docs-links-pr)
    participant Script as check-docs.sh
    participant FS as Repository File System
    participant CI as CI Result

    PR->>GHA: opened/reopened/synchronize with .md changes
    GHA->>GHA: git diff base...head -> filter and list .md files
    alt markdown files present
        GHA->>Script: run with --only-links --local-only and file paths
        Script->>Script: parse files, skip fenced code, emit line_no<TAB>target
        loop per extracted target
            Script->>FS: check target exists
            FS-->>Script: exists / missing
        end
        alt missing targets found
            Script-->>CI: exit non-zero with "md_path:line_no -> target"
        else
            Script-->>CI: exit 0
        end
    else no markdown files
        GHA-->>CI: skip link check
    end
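
The per-target loop in the diagram can be sketched in JavaScript as follows. This is a minimal illustration and not the code in check-docs.sh, which is a shell script; the helper name `findBrokenLocalLinks` and the `{ lineNo, target }` entry shape are assumptions made for the example. Only the `md_path:line_no -> target` diagnostic format comes from the diagram.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: resolve each extracted target relative to the Markdown
// file that contains it and collect a diagnostic for every missing file.
// `entries` is an array of { lineNo, target } objects (assumed shape).
function findBrokenLocalLinks(mdPath, entries) {
  const baseDir = path.dirname(mdPath);
  const broken = [];
  for (const { lineNo, target } of entries) {
    // Drop any "#fragment" suffix; only the file part is checked here.
    const filePart = target.split("#")[0];
    if (filePart === "") continue; // pure in-page anchor, nothing to resolve
    const resolved = path.resolve(baseDir, filePart);
    if (!fs.existsSync(resolved)) {
      broken.push(`${mdPath}:${lineNo} -> ${target}`);
    }
  }
  return broken;
}
```

A caller would print the returned diagnostics and exit non-zero when the array is non-empty, matching the two exit branches shown in the diagram.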

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop through docs by lantern light,

I count each link, I keep them right.
Fenced code hides its whispered tunes,
I skip those nooks and flag the ruins.
CI nods — the paths are bright tonight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'ci(docs): check changed markdown links on pull requests' clearly and concisely summarizes the main change: adding a CI workflow to validate markdown links in PRs. |
| Linked Issues check | ✅ Passed | The PR meets all objectives from issue #552: adds CI workflow for markdown link checking, scans changed .md files, validates relative links, reports broken links with source file and line number, skips external URLs/anchors/code blocks, excludes non-doc paths, and implements local-only mode for PR checks. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #552 objectives: the workflow file implements the CI check, the test file validates link-checking behavior including fenced-code exclusions, and the script improvements add line numbers to diagnostics and support tilde fences. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@13ernkastel 13ernkastel marked this pull request as ready for review March 31, 2026 05:54
@13ernkastel
Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e/e2e-cloud-experimental/check-docs.sh (1)

224-230: Handle tilde fences (~~~) in fenced-code skipping.

extract_targets currently toggles fence state only for backtick fences, so links inside ~~~ fenced blocks can still be parsed as real links. Line 225 is the toggle point to broaden.

Suggested patch
-    if (/^\s*```/) { $in = !$in; next; }
+    if (/^\s*(```|~~~)/) { $in = !$in; next; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e-cloud-experimental/check-docs.sh` around lines 224 - 230, The
fenced-code detection only toggles for backticks (the Perl one-liner in
check-docs.sh that currently uses if (/^\s*```/) to flip $in), so add tilde
fence support by changing that condition to match either ``` or ~~~ (i.e.,
update the Perl fence-toggle regex inside the extract/processing one-liner to
/^\s*(```|~~~)/ so links inside ~~~ blocks are skipped as well).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/docs-links-pr.yaml:
- Line 35: The current population of md_files captures all changed *.md
including vendored/generated paths; update the mapfile command that sets
md_files to exclude common non-doc directories by filtering out patterns like
node_modules, dist, vendor, build (e.g., modify the git diff pipeline that
produces md_files or append a grep -v -E '^(node_modules|dist|vendor|build)/'
before the sort) so md_files only contains real documentation markdown changes.
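
The same exclusion rule can be sketched in JavaScript for illustration. The workflow itself does this in shell with git diff and grep; the function name `filterDocMarkdown` is hypothetical, and the directory list simply mirrors the pattern suggested in this comment.

```javascript
// Hypothetical filter mirroring the suggested grep -v -E pattern: keep changed
// Markdown paths, minus files under top-level vendored or generated directories.
const IGNORED_PREFIX = /^(node_modules|dist|vendor|build)\//;

function filterDocMarkdown(changedPaths) {
  return changedPaths
    .filter((p) => p.endsWith(".md")) // only Markdown files
    .filter((p) => !IGNORED_PREFIX.test(p)) // drop ignored top-level directories
    .sort();
}
```

Because the pattern is anchored at the start of the path, a nested directory such as packages/a/node_modules/ is not excluded. Whether that matters depends on the repository layout.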

---

Nitpick comments:
In `@test/e2e/e2e-cloud-experimental/check-docs.sh`:
- Around line 224-230: The fenced-code detection only toggles for backticks (the
Perl one-liner in check-docs.sh that currently uses if (/^\s*```/) to flip $in),
so add tilde fence support by changing that condition to match either ``` or ~~~
(i.e., update the Perl fence-toggle regex inside the extract/processing
one-liner to /^\s*(```|~~~)/ so links inside ~~~ blocks are skipped as well).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 47af3454-3672-41b5-a1a6-40f5430603a9

📥 Commits

Reviewing files that changed from the base of the PR and between 0086886 and 113bbf4.

📒 Files selected for processing (3)
  • .github/workflows/docs-links-pr.yaml
  • test/check-docs-links.test.js
  • test/e2e/e2e-cloud-experimental/check-docs.sh

@13ernkastel 13ernkastel force-pushed the codex/issue-552-docs-link-checker branch from bd5d7fc to 1ff2346 Compare March 31, 2026 06:16
@13ernkastel
Author

Addressed the actionable review feedback in 1ff2346 and resolved the review conversation.

Updates included:

  • excluded node_modules/, dist/, vendor/, and build/ paths from the changed-markdown diff list
  • taught the markdown link extractor to ignore ~~~ fenced code blocks as well as backtick fences
  • added regression coverage for tilde-fenced blocks

Validation rerun:

  • bash -n test/e2e/e2e-cloud-experimental/check-docs.sh
  • ruby -e 'require "yaml"; puts YAML.load_file(".github/workflows/docs-links-pr.yaml")["name"]'
  • npx vitest run test/check-docs-links.test.js
  • bash test/e2e/e2e-cloud-experimental/check-docs.sh --only-links --local-only README.md

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/e2e-cloud-experimental/check-docs.sh`:
- Around line 224-225: Replace the permissive fence-toggle logic in the perl
one-liner (the /^\s*(```|~~~)/ branch that flips $in) with a delimiter-aware
handler: detect fence openings via /^\s*(`{3,}|~{3,})/ and record the fence
character and length into $fch and $flen when entering ($in=1), and only exit
the fence (set $in=0 and clear $fch/$flen) when a closing fence uses the same
character and length >= $flen; update the assignments/tuples that set ($in,
$fch, $flen) accordingly so the link-scanning next/while blocks remain gated by
the refined $in state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ed0e22e1-3e2a-47bf-b23d-a297ccdf6b0a

📥 Commits

Reviewing files that changed from the base of the PR and between 113bbf4 and 1ff2346.

📒 Files selected for processing (3)
  • .github/workflows/docs-links-pr.yaml
  • test/check-docs-links.test.js
  • test/e2e/e2e-cloud-experimental/check-docs.sh
✅ Files skipped from review due to trivial changes (1)
  • .github/workflows/docs-links-pr.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/check-docs-links.test.js

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/e2e-cloud-experimental/check-docs.sh`:
- Around line 225-235: The fence-closing check using the regex
/^\s*(`{3,}|~{3,})/ incorrectly treats lines like "```not-a-close" as closes;
update the logic in the block that uses the regex and variables ($in, $fch,
$flen) so that a closing fence is only accepted if the matched fence is followed
only by optional whitespace (i.e., nothing else on the line). Concretely, either
change the regex to assert end-of-line or only whitespace after the marker, or
after matching check the remainder of the line for non-whitespace characters
before flipping ($in, $fch, $flen) to close the fence.
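
For reference, the fence rules discussed across these reviews (backtick or tilde openers of three or more characters, a closer of the same character and at least the opener's length, and nothing but whitespace after the closer) can be sketched in JavaScript. This is an illustrative sketch under those assumptions and not the Perl one-liner in check-docs.sh; the name `extractTargets` and the simplified inline-link regex are hypothetical, and inline code spans are not handled.

```javascript
// Sketch of a fence-aware link extractor. Returns "line_no<TAB>target" strings.
function extractTargets(markdown) {
  const out = [];
  let fenceChar = null; // "`" or "~" while inside a fence, otherwise null
  let fenceLen = 0; // length of the opening fence run
  const lines = markdown.split(/\r?\n/);

  lines.forEach((line, idx) => {
    // m[1] is the fence run, m[2] is whatever follows it on the line.
    const m = /^\s*(`{3,}|~{3,})(.*)$/.exec(line);
    if (fenceChar !== null) {
      // Inside a fence: close only on the same character, a run at least as
      // long as the opener, and no trailing text other than whitespace.
      if (m && m[1][0] === fenceChar && m[1].length >= fenceLen && m[2].trim() === "") {
        fenceChar = null;
        fenceLen = 0;
      }
      return; // every line inside a fence, including the closer, is skipped
    }
    if (m) {
      // Opening fence: remember its character and length.
      fenceChar = m[1][0];
      fenceLen = m[1].length;
      return;
    }
    // Outside any fence: collect inline link targets of the form [text](target).
    const linkRe = /\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;
    let lm;
    while ((lm = linkRe.exec(line)) !== null) {
      out.push(`${idx + 1}\t${lm[1]}`);
    }
  });
  return out;
}
```

Tracking the opener's character and length is what prevents a tilde line from closing a backtick fence, and the trailing-text check is what keeps a line like "```not-a-close" from ending the block early.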

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b784f8ab-e97b-46f5-92cf-76320a8b344d

📥 Commits

Reviewing files that changed from the base of the PR and between 1ff2346 and 8af07c8.

📒 Files selected for processing (2)
  • test/check-docs-links.test.js
  • test/e2e/e2e-cloud-experimental/check-docs.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/check-docs-links.test.js

@cv cv enabled auto-merge (squash) March 31, 2026 07:20
@13ernkastel
Author

The current head is ready from my side, but the latest required checks are still in GitHub's action_required state because this is a fork PR. A maintainer workflow approval should unblock dco-check, commit-lint, Docs Links PR, and pr for the current head (be3c965). Once those run, the PR should be mergeable.

@13ernkastel 13ernkastel requested a review from cv March 31, 2026 13:25
Includes the follow-up doc checker fixes from the original branch: locale-stable link checking, tighter markdown link filtering, and regression coverage for mixed fence delimiters and trailing text on fence closers.

Signed-off-by: 13ernkastel <[email protected]>
auto-merge was automatically disabled April 1, 2026 17:52

Head branch was pushed to by a user without write access

@13ernkastel 13ernkastel force-pushed the codex/issue-552-docs-link-checker branch from 858c0da to c059875 Compare April 1, 2026 17:52
@wscurran wscurran added the documentation (Improvements or additions to documentation) and CI/CD (Use this label to identify issues with NemoClaw CI/CD pipeline or GitHub Actions.) labels Apr 1, 2026

Development

Successfully merging this pull request may close these issues.

ci: add markdown link checker for docs and README

3 participants